An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition

Authors

  • Bhiksha Raj
  • Lorenzo Turicchia
  • Bent Schmidt-Nielsen
  • Rahul Sarpeshkar
Abstract

Recommended by Stephen Voran

We describe an FFT-based companding algorithm for preprocessing speech before recognition. The algorithm mimics tone-to-tone suppression and masking in the auditory system to improve automatic speech recognition performance in noise. Because it is built on the FFT, it is also computationally efficient and well suited to digital implementations. In an automotive digits recognition task with the CU-Move database, recorded in real environmental noise, the algorithm improves the relative word error rate by 12.5% at −5 dB signal-to-noise ratio (SNR) and by 6.2% across all SNRs (−5 dB to +15 dB). In the Aurora-2 database, recorded with artificially added noise in several environments, the algorithm improves the relative word error rate in almost all situations.
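The abstract reports its gains as relative word error rate improvements. As a minimal sketch of what such a figure means, the snippet below computes a relative improvement from two absolute word error rates. The function name and the baseline and companded error rates are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative word error rate improvement in percent.

    Both arguments are absolute word error rates on the same scale
    (for example, percent of words misrecognized).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical error rates, NOT results from the paper.
baseline = 40.0         # assumed WER of a recognizer without the front end
with_companding = 35.0  # assumed WER with the companding front end

improvement = relative_wer_improvement(baseline, with_companding)
print(f"{improvement:.1f}% relative improvement")  # prints "12.5% relative improvement"
```

Under these assumed numbers, an absolute drop of 5 points from a 40% baseline corresponds to a 12.5% relative improvement, which is why relative figures should be read alongside the baseline error rate.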

Related resources

Improving the performance of MFCC for Persian robust speech recognition

The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Perceptual Models for Speech, Audio, and Music Processing

New understandings of human auditory perception have recently contributed to advances in numerous areas related to audio, speech, and music processing. These include coding, speech and speaker recognition, synthesis, signal separation, signal enhancement, automatic content identification and retrieval, and quality estimation. Researchers continue to seek more detailed, accurate, and robust ch...

A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition

This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) frontend that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. ...

A Novel Front-end Based on Variable Frame Rate Analysis and Mel-filterbank Output Compensation for Robust ASR

For automatic speech recognition (ASR) systems, robustness in the presence of various types and levels of environmental noise remains an important issue, despite the various advances of recent years. This paper describes a new noise-robust ASR front-end employing a combination of variable frame rate processing based on the sample-by-sample delta energy parameter, Mel-filterbank output compensati...

Robust speech recognition in noise: an evaluation using the SPINE corpus

In this paper, methodologies for effective speech recognition are considered along with evaluations of an NRL speech in noise corpus entitled SPINE. When speech is produced in adverse conditions that include high levels of noise, workload task stress, and Lombard effect, new challenges arise concerning how to best improve recognition performance. Here, we consider tradeoffs in (i) robust featur...

Journal:
  • EURASIP J. Audio, Speech and Music Processing

Volume 2007

Publication date 2007